3,403 research outputs found

    A new SVD approach to optimal topic estimation

    In probabilistic topic models, the quantity of interest, a low-rank matrix consisting of topic vectors, is hidden in the text corpus matrix and masked by noise, and Singular Value Decomposition (SVD) is a potentially useful tool for learning such a matrix. However, different rows and columns of the matrix are usually on very different scales, and the connection between this matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so using SVD to learn topic models is challenging. We overcome these challenges by introducing a proper Pre-SVD normalization of the text corpus matrix and a proper column-wise scaling for the matrix of interest, and by revealing a surprising Post-SVD low-dimensional simplex structure. The simplex structure, together with the Pre-SVD normalization and column-wise scaling, allows us to conveniently reconstruct the matrix of interest, and motivates a new SVD-based approach to learning topic models. We show that under the popular probabilistic topic model (Hofmann, 1999), our method has a faster rate of convergence than existing methods in a wide variety of cases. In particular, for cases where documents are long or $n$ is much larger than $p$, our method achieves the optimal rate. At the heart of the proofs is a tight element-wise bound on the singular vectors of a multinomially distributed data matrix, which does not exist in the literature and which we had to derive ourselves. We have applied our method to two data sets, Associated Press (AP) and Statistics Literature Abstract (SLA), with encouraging results. In particular, there is a clear simplex structure associated with the SVD of the data matrices, which largely validates our discovery.
    Comment: 73 pages, 8 figures, 6 tables; considered two different VH algorithms, OVH and GVH, and provided theoretical analysis for each; reorganized the upper bound theory part; added a subsection comparing error rates with other existing methods; provided an improved error analysis through a Bernstein inequality for martingales
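
The abstract names three ingredients (a Pre-SVD normalization, an SVD, and a Post-SVD simplex structure) without giving formulas. The following is a minimal sketch of how such a pipeline might look, not the authors' algorithm. The specific normalization (dividing each word row by the square root of its mean frequency), the entry-wise ratio against the first singular vector, and the names `svd_ratio_embedding` and `simulate_corpus` are all assumptions made for illustration.

```python
import numpy as np

def svd_ratio_embedding(D, K):
    """Map each word to a (K-1)-dimensional point via a normalized SVD.

    D is a p x n word-by-document frequency matrix (columns sum to 1).
    The normalization and ratio steps below are illustrative assumptions.
    """
    D = np.asarray(D, dtype=float)
    row_mean = D.mean(axis=1)
    keep = row_mean > 0                      # drop words that never occur
    D, row_mean = D[keep], row_mean[keep]
    # Assumed Pre-SVD normalization: put word rows on a comparable scale.
    D_norm = D / np.sqrt(row_mean)[:, None]
    U, _, _ = np.linalg.svd(D_norm, full_matrices=False)
    xi = U[:, :K]                            # top-K left singular vectors
    # Assumed Post-SVD step: entry-wise ratios against the first vector.
    return xi[:, 1:] / xi[:, [0]]

def simulate_corpus(p=50, n=300, K=3, doc_len=200, seed=0):
    """Draw a synthetic corpus: multinomial counts around a rank-K mean."""
    rng = np.random.default_rng(seed)
    A = rng.dirichlet(np.ones(p), size=K).T  # p x K topic matrix
    W = rng.dirichlet(np.ones(K), size=n).T  # K x n topic weights
    D0 = A @ W                               # expected word frequencies
    D = np.empty((p, n))
    for j in range(n):
        pv = D0[:, j] / D0[:, j].sum()       # guard against rounding drift
        D[:, j] = rng.multinomial(doc_len, pv) / doc_len
    return D

D = simulate_corpus()
R = svd_ratio_embedding(D, K=3)
print(R.shape)
```

Plotting the rows of `R` for a synthetic corpus like this one is a simple way to look for the kind of low-dimensional point cloud the abstract describes; recovering the topic matrix from it would need the vertex-hunting and scaling steps that the abstract only mentions by name.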

    Orthonormal Polynomials on the Unit Circle and Spatially Discrete Painlev\'e II Equation

    We consider the polynomials $\phi_n(z) = \kappa_n (z^n + b_{n-1} z^{n-1} + \cdots)$ orthonormal with respect to the weight $\exp(\sqrt{\lambda}(z + 1/z))\, dz/(2\pi i z)$ on the unit circle in the complex plane. The leading coefficient $\kappa_n$ is found to satisfy a difference-differential (spatially discrete) equation, which is further proved to approach a third-order differential equation under double scaling. The third-order differential equation is equivalent to the Painlevé II equation. The leading coefficient and second leading coefficient of $\phi_n(z)$ can be expressed asymptotically in terms of the Painlevé II function.
    Comment: 16 pages
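
For reference, the orthonormality condition implied by this weight can be written out explicitly. This is a standard restatement rather than a formula quoted from the abstract. On the unit circle $z = e^{i\theta}$, so $dz/(2\pi i z) = d\theta/(2\pi)$ and $z + 1/z = 2\cos\theta$:

```latex
\[
\oint_{|z|=1} \phi_n(z)\,\overline{\phi_m(z)}\,
  e^{\sqrt{\lambda}\,(z + 1/z)}\,\frac{dz}{2\pi i z}
= \int_{0}^{2\pi} \phi_n(e^{i\theta})\,\overline{\phi_m(e^{i\theta})}\,
  e^{2\sqrt{\lambda}\cos\theta}\,\frac{d\theta}{2\pi}
= \delta_{nm}.
\]
```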

    Instructional strategies and teacher-student interaction in the classrooms of a Chinese immersion school

    unavailable

    Riemann-Hilbert approach to multi-time processes; the Airy and the Pearcey case

    We prove that matrix Fredholm determinants related to multi-time processes can be expressed in terms of determinants of integrable kernels à la Its-Izergin-Korepin-Slavnov (IIKS) and hence related to suitable Riemann-Hilbert problems, thus extending the known results for the single-time case. We focus on the Airy and Pearcey processes. As an example application, we rederive a third-order PDE, found by Adler and van Moerbeke, for the two-time Airy process.
    Comment: 18 pages, 1 figure

    Application Testing Under Developer Specified Device Resource Occupancy

    During normal use, consumer devices may stay switched on for long periods without a shutdown and restart. A long time since the last restart can lead to high usage of device resources such as CPU, memory, and storage. The resulting performance problems and errors are hard to detect in clean functional test environments. This disclosure describes techniques to emulate end-user scenarios, such as a long time since the last restart and high resource utilization, by letting the developer configure the CPU, memory, and storage usage of a device-under-test (DUT) through a device resources management tool. The tool invokes low-level operating system APIs to occupy a specified percentage of each resource. The extent to which each resource is occupied can be set independently or in combination. The tool lets developers emulate various real-world resource utilization scenarios and can help identify bugs that are otherwise rare or difficult to reproduce.
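
As a rough illustration of the idea, the sketch below occupies memory and CPU from a user-space script. It is an assumption-laden stand-in, not the disclosed tool: it uses only the Python standard library instead of low-level operating system APIs, it takes an absolute memory size instead of a percentage, and the function names `occupy_memory` and `occupy_cpu` are hypothetical.

```python
import time

PAGE_SIZE = 4096  # assumed page size; real systems may differ

def occupy_memory(megabytes):
    """Allocate roughly `megabytes` MiB and touch each page so it is committed."""
    buf = bytearray(megabytes * 1024 * 1024)
    for i in range(0, len(buf), PAGE_SIZE):
        buf[i] = 1  # writing forces the OS to back the page with real memory
    return buf      # the caller must keep a reference to hold the memory

def occupy_cpu(fraction, duration_s, period_s=0.1):
    """Keep one core busy for about `fraction` of each period."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        busy_until = time.monotonic() + fraction * period_s
        while time.monotonic() < busy_until:
            pass                                  # spin: consumes CPU time
        time.sleep((1.0 - fraction) * period_s)   # idle for the remainder

held = occupy_memory(8)                  # hold about 8 MiB
occupy_cpu(fraction=0.5, duration_s=0.3) # about 50% load on one core
```

Running the application under test while such a script holds resources approximates the "device has been on for weeks" condition; occupying several cores or storage would need one worker per core and a scratch file, which this sketch leaves out.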

    Virtual devices as a service

    Software applications are developed and tested on a large and evolving variety of devices of different types. Development and testing with physical devices is tedious and time-consuming and has scaling and reliability problems. Per the techniques of this disclosure, a large pool of virtual devices is instantiated on a compute cluster and made available to software developers as a service. Developers check out as many virtual devices as needed, conduct test and development activity, reset the devices, and release them back to the pool. The techniques obviate the need for physical devices and the concomitant issues of cost and reliability, and enable large-scale testing and development and faster device releases.
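
The check-out, reset, and release cycle described above can be sketched as a small in-memory pool. This is a toy model under stated assumptions: the class names `VirtualDevice` and `VirtualDevicePool` and their methods are hypothetical, and a real service would manage emulator instances on a cluster instead of Python objects.

```python
class VirtualDevice:
    """Hypothetical stand-in for one virtual device instance."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.state = {}            # whatever a test session leaves behind

    def reset(self):
        self.state.clear()         # return the device to a clean state


class VirtualDevicePool:
    """Hypothetical pool from which developers check out devices."""

    def __init__(self, size):
        self._free = [VirtualDevice(i) for i in range(size)]

    @property
    def available(self):
        return len(self._free)

    def check_out(self, count):
        if count > len(self._free):
            raise RuntimeError("not enough free virtual devices")
        taken = self._free[:count]
        self._free = self._free[count:]
        return taken

    def release(self, devices):
        for device in devices:
            device.reset()         # reset before the device is reused
            self._free.append(device)


pool = VirtualDevicePool(size=3)
devices = pool.check_out(2)
devices[0].state["installed_app"] = "demo"   # simulate test activity
pool.release(devices)
```

Resetting inside `release` is a design choice made here so that a device can never re-enter the pool dirty; the disclosure lists reset as a step the developer performs before release.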

    When corporate scandal hits retail investors close to home

    People reduce their participation in the stock market after a case of corporate fraud in their state, write Mariassunta Giannetti and Tracy Yue Wan